ATI Radeon HD 5870 Architecture Analysis

Written by Tim Smalley

September 30, 2009 | 17:58

Texture Sampling

With RV770, AMD made some big changes to its cache hierarchy and texture sampling architecture. Those changes are still very much apparent in Cypress, although some tweaks have been made as part of the general architectural upscaling.

I mentioned earlier that there are 80 texture units across the entire Cypress GPU in its full-fat form, but that only covers the chip's texture address capabilities. There's another way to look at the texture units once you take all of their capabilities into account and, to save confusion, I've used slightly different terminology.

Each core has a texture sampler, which can handle four addresses, 16 FP32 samples and four FP32 filters per clock.

When everything is added up, the chip can now fetch 320 32-bit texture samples per clock (a rate of 272 billion 32-bit texture fetches per second) and bilinear filter 80 texels per clock (68 billion texels per second).
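As a quick sanity check, the chip-wide totals can be rebuilt from the per-core figures. This is a minimal sketch rather than anything from AMD: the core count is derived from the 80 texture address units at four addresses per core, and the 850MHz core clock is an assumption (the HD 5870's reference clock, which isn't restated on this page). The function and constant names are purely illustrative.

```python
# Rebuild the chip-wide texture rates from the per-core figures quoted above.
CORE_CLOCK_HZ = 850_000_000   # assumption: reference core clock, not restated on this page
TEXTURE_UNITS = 80            # chip-wide texture address units
ADDRESSES_PER_CORE = 4        # texture addresses per core per clock
SAMPLES_PER_CORE = 16         # FP32 samples per core per clock
FILTERS_PER_CORE = 4          # FP32 filters per core per clock


def texture_rates(clock_hz=CORE_CLOCK_HZ):
    """Return per-clock and per-second texture rates for the whole chip."""
    cores = TEXTURE_UNITS // ADDRESSES_PER_CORE
    samples_per_clock = cores * SAMPLES_PER_CORE
    filters_per_clock = cores * FILTERS_PER_CORE
    return {
        "cores": cores,
        "samples_per_clock": samples_per_clock,
        "filters_per_clock": filters_per_clock,
        "samples_per_sec": samples_per_clock * clock_hz,
        "filtered_texels_per_sec": filters_per_clock * clock_hz,
    }


if __name__ == "__main__":
    for name, value in texture_rates().items():
        print(f"{name}: {value:,}")
```

With those inputs the sketch lands on 320 samples and 80 filtered texels per clock, which works out to 272 billion fetches and 68 billion filtered texels per second at the assumed clock.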

Cypress's texture cache hierarchy

There's support for several new block-compressed texture formats, which were originally proposed by AMD and have been adopted for use with both 8-bit per channel and FP16 HDR textures, with the latter being a new addition in DirectX 11. Compression ratios of up to 6:1 are now possible thanks to the new compression algorithms, and DX11 also increases the maximum texture resolution to 16k by 16k (16,384 x 16,384) texels.
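To put the 16k limit and the 6:1 ratio into context, here's a rough footprint calculation. It's only a sketch under stated assumptions: a single mip level, a three-channel FP16 texture (six bytes per texel) and the best-case 6:1 ratio applied uniformly. The helper name is hypothetical.

```python
def texture_footprint_bytes(width, height, bytes_per_texel, compression_ratio=1):
    """Size of one mip level in bytes, divided by an optional compression ratio."""
    return width * height * bytes_per_texel / compression_ratio


MAX_DIM = 16 * 1024       # "16k" taken as 16,384 texels per side
FP16_RGB_BYTES = 3 * 2    # assumption: three FP16 channels per texel
MIB = 1024 ** 2

raw = texture_footprint_bytes(MAX_DIM, MAX_DIM, FP16_RGB_BYTES)
packed = texture_footprint_bytes(MAX_DIM, MAX_DIM, FP16_RGB_BYTES, compression_ratio=6)

print(f"uncompressed: {raw / MIB:.0f} MiB")
print(f"at 6:1:       {packed / MIB:.0f} MiB")
```

Under those assumptions a maximum-size HDR texture drops from 1,536MiB to 256MiB, which is the difference between not fitting on a 1GB card at all and being merely very large.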

The L1 texture cache has remained unchanged in terms of size and associativity - it still has effectively unlimited access per clock cycle - but the increased core count means that the number of texture caches has doubled. There are now twenty 8KB L1 texture caches, meaning a total of 160KB L1 texture cache GPU-wide. The four L2 caches, which are associated with each of the four memory controllers, have doubled in capacity as well and are now 128KB each, meaning a total of 512KB across the GPU.

Texture bandwidth has also been bolstered, with texture fetches from L1 cache happening at up to 1TB/sec (one terabyte per second) - that's more than double the L1 texture cache bandwidth available in RV770. I said so earlier, but it's worth reiterating - that's a phenomenal amount of bandwidth. What's more, bandwidth between the L1 and L2 caches has been increased to 435GB/sec from 384GB/sec on RV770 - another impressive figure.
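The cache totals and bandwidth figures are easier to compare once they're broken down per cache and per clock. The sketch below does that arithmetic; it assumes an 850MHz core clock (not restated on this page) and reads "1TB/sec" as a rounded 10^12 bytes per second, so the per-clock result is approximate. All names are illustrative.

```python
CORE_CLOCK_HZ = 850_000_000        # assumption: reference core clock
L1_CACHES, L1_SIZE_KB = 20, 8      # twenty 8KB L1 texture caches
L2_CACHES, L2_SIZE_KB = 4, 128     # four 128KB L2 caches
L1_FETCH_BYTES_PER_SEC = 1e12      # "up to 1TB/sec", read as a rounded 10^12 bytes/sec
L1_L2_GB_PER_SEC = 435             # Cypress L1 <-> L2 bandwidth
RV770_L1_L2_GB_PER_SEC = 384       # RV770 L1 <-> L2 bandwidth


def cache_summary():
    """Aggregate the cache sizes and derive per-clock / relative bandwidth."""
    bytes_per_clock = L1_FETCH_BYTES_PER_SEC / CORE_CLOCK_HZ
    return {
        "l1_total_kb": L1_CACHES * L1_SIZE_KB,
        "l2_total_kb": L2_CACHES * L2_SIZE_KB,
        "l1_bytes_per_clock_per_cache": bytes_per_clock / L1_CACHES,
        "l1_l2_gain_percent": (L1_L2_GB_PER_SEC / RV770_L1_L2_GB_PER_SEC - 1) * 100,
    }


if __name__ == "__main__":
    for name, value in cache_summary().items():
        print(f"{name}: {value:.2f}")
```

On these inputs each L1 cache delivers somewhere in the region of 59 bytes per clock, while the L1-to-L2 link is roughly 13 per cent faster than RV770's - a far smaller jump than the L1 fetch bandwidth itself.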

Figure: 3DMark Vantage texture fillrate, as reported by the benchmark (Gtexels/sec; series labelled "Single Texture")

ATI Radeon HD 5870 1GB: 1882.0
ATI Radeon HD 4890 1GB: 886.7
Nvidia GeForce GTX 285 1GB: 723.1
Nvidia GeForce 8800 GTX 768MB: 512.9
ATI Radeon HD 3870 512MB: 347.1

Figure: 3DMark06 texture fillrate (Mtexels/sec): Single Texture / Multi Texture / Theoretical Peak

ATI Radeon HD 5870 1GB: 16411.6 / 65599.8 / 68000.0
Nvidia GeForce GTX 285 1GB: 16315.1 / 48682.6 / 51840.0
ATI Radeon HD 4890 1GB: 13138.4 / 26534.9 / 34000.0
Nvidia GeForce 8800 GTX 768MB: 6915.1 / 18033.4 / 18400.0
ATI Radeon HD 3870 512MB: 6454.1 / 12279.6 / 12400.0

See previous page for test kit

While 3DMark Vantage incorrectly reports the actual throughput, it appears to scale correctly compared to the other texture filtering tests we've done; the graph shows the relative performance across the various GPU architecture generations using a DX10 texture format. Our 3DMark06 multi-texturing test shows that the HD 5870 gets very close to its theoretical peak texture sampling rate (around 65.6 of a possible 68 Gtexels/sec), despite the massive amount of texturing horsepower available.

That's unlike RV770/790, which didn't reach 80 per cent of its theoretical peak throughput, and it puts Cypress more in line with what we've come to expect from Nvidia's hardware. It's likely thanks to the hugely improved bandwidth to the L1 texture cache - D3D RightMark's texture fillrate test effectively confirms this.
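The comparison against theoretical peak is just the measured multi-texture rate divided by the peak rate. Here's a minimal sketch using the 3DMark06 multi-texture and theoretical peak figures from the chart data on this page; the dictionary layout and function name are my own for illustration.

```python
# (multi-texture fillrate, theoretical peak), both in Mtexels/sec, from the 3DMark06 chart.
RESULTS = {
    "ATI Radeon HD 5870 1GB": (65599.8, 68000.0),
    "Nvidia GeForce GTX 285 1GB": (48682.6, 51840.0),
    "ATI Radeon HD 4890 1GB": (26534.9, 34000.0),
    "Nvidia GeForce 8800 GTX 768MB": (18033.4, 18400.0),
    "ATI Radeon HD 3870 512MB": (12279.6, 12400.0),
}


def sampling_efficiency(results=RESULTS):
    """Return measured multi-texture rate as a fraction of theoretical peak."""
    return {card: multi / peak for card, (multi, peak) in results.items()}


if __name__ == "__main__":
    for card, fraction in sampling_efficiency().items():
        print(f"{card}: {fraction:.1%} of theoretical peak")
```

Run against these figures, the HD 5870 comes out at roughly 96 per cent of peak and the HD 4890 at about 78 per cent, with the GTX 285 at roughly 94 per cent.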

D3D10 - 3DMark Vantage: Texture Fillrate (Multi-Texture Fillrate Test, Default Settings)

D3D9.0c - 3DMark06: Texture Fillrate (Single and Multi-Texture Fillrate Tests, Default Settings)
